{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 实战Kaggle比赛：图像分类（CIFAR-10）\n",
    "\n",
    "到目前为止，我们一直在用Gluon的`data`包直接获取`NDArray`格式的图像数据集。然而，实际中的图像数据集往往是以图像文件的形式存在的。在本节中，我们将从原始的图像文件开始，一步步整理、读取并将其变换为`NDArray`格式。\n",
    "\n",
    "我们曾在[“图像增广”](image-augmentation.ipynb)一节中实验过CIFAR-10数据集。它是计算机视觉领域的一个重要数据集。现在我们将应用前面所学的知识，动手实战CIFAR-10图像分类问题的Kaggle比赛。该比赛的网页地址是 https://www.kaggle.com/c/cifar-10 。\n",
    "\n",
    "图9.16展示了该比赛的网页信息。为了便于提交结果，请先在Kaggle网站上注册账号。\n",
    "\n",
    "![CIFAR-10图像分类比赛的网页信息。比赛数据集可通过点击“Data”标签获取](../img/kaggle_cifar10.png)\n",
    "\n",
    "首先，导入比赛所需的包或模块。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "1"
    }
   },
   "outputs": [],
   "source": [
    "import d2lzh as d2l\n",
    "from mxnet import autograd, gluon, init\n",
    "from mxnet.gluon import data as gdata, loss as gloss, nn\n",
    "import os\n",
    "import pandas as pd\n",
    "import shutil\n",
    "import time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 获取和整理数据集\n",
    "\n",
    "比赛数据分为训练集和测试集。训练集包含5万张图像。测试集包含30万张图像，其中有1万张图像用来计分，其他29万张不计分的图像是为了防止人工标注测试集并提交标注结果。两个数据集中的图像格式都是png，高和宽均为32像素，并含有RGB三个通道（彩色）。图像一共涵盖10个类别，分别为飞机、汽车、鸟、猫、鹿、狗、青蛙、马、船和卡车。图9.16的左上角展示了数据集中部分飞机、汽车和鸟的图像。\n",
    "\n",
    "### 下载数据集\n",
    "\n",
    "登录Kaggle后，可以点击图9.16所示的CIFAR-10图像分类比赛网页上的“Data”标签，并分别下载训练数据集train.7z、测试数据集test.7z和训练数据集标签trainLabels.csv。\n",
    "\n",
    "\n",
    "### 解压数据集\n",
    "\n",
    "下载完训练数据集train.7z和测试数据集test.7z后需要解压缩。解压缩后，将训练数据集、测试数据集以及训练数据集标签分别存放在以下3个路径：\n",
    "\n",
    "* ../data/kaggle_cifar10/train/[1-50000].png；\n",
    "* ../data/kaggle_cifar10/test/[1-300000].png；\n",
    "* ../data/kaggle_cifar10/trainLabels.csv。\n",
    "\n",
    "为方便快速上手，我们提供了上述数据集的小规模采样，其中train_tiny.zip包含100个训练样本，而test_tiny.zip仅包含1个测试样本。它们解压后的文件夹名称分别为train_tiny和test_tiny。此外，将训练数据集标签的压缩文件解压，并得到trainLabels.csv。如果使用上述Kaggle比赛的完整数据集，还需要把下面`demo`变量改为`False`。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "2"
    }
   },
   "outputs": [],
   "source": [
    "# 如果使用下载的Kaggle比赛的完整数据集，把demo变量改为False\n",
    "demo = True\n",
    "if demo:\n",
    "    import zipfile\n",
    "    for f in ['train_tiny.zip', 'test_tiny.zip', 'trainLabels.csv.zip']:\n",
    "        with zipfile.ZipFile('../data/kaggle_cifar10/' + f, 'r') as z:\n",
    "            z.extractall('../data/kaggle_cifar10/')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 整理数据集\n",
    "\n",
    "我们需要整理数据集，以方便训练和测试模型。以下的`read_label_file`函数将用来读取训练数据集的标签文件。该函数中的参数`valid_ratio`是验证集样本数与原始训练集样本数之比。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "3"
    }
   },
   "outputs": [],
   "source": [
    "def read_label_file(data_dir, label_file, train_dir, valid_ratio):\n",
    "    with open(os.path.join(data_dir, label_file), 'r') as f:\n",
    "        # 跳过文件头行（栏名称）\n",
    "        lines = f.readlines()[1:]\n",
    "        tokens = [l.rstrip().split(',') for l in lines]\n",
    "        idx_label = dict(((int(idx), label) for idx, label in tokens))\n",
    "    labels = set(idx_label.values())\n",
    "    n_train_valid = len(os.listdir(os.path.join(data_dir, train_dir)))\n",
    "    n_train = int(n_train_valid * (1 - valid_ratio))\n",
    "    assert 0 < n_train < n_train_valid\n",
    "    return n_train // len(labels), idx_label"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面定义一个辅助函数，从而仅在路径不存在的情况下创建路径。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "4"
    }
   },
   "outputs": [],
   "source": [
    "def mkdir_if_not_exist(path):  # 本函数已保存在d2lzh包中方便以后使用\n",
    "    if not os.path.exists(os.path.join(*path)):\n",
    "        os.makedirs(os.path.join(*path))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们接下来定义`reorg_train_valid`函数来从原始训练集中切分出验证集。以`valid_ratio=0.1`为例，由于原始训练集有50,000张图像，调参时将有45,000张图像用于训练并存放在路径`input_dir/train`下，而另外5,000张图像将作为验证集并存放在路径`input_dir/valid`下。经过整理后，同一类图像将被放在同一个文件夹下，便于稍后读取。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "5"
    }
   },
   "outputs": [],
   "source": [
    "def reorg_train_valid(data_dir, train_dir, input_dir, n_train_per_label,\n",
    "                      idx_label):\n",
    "    label_count = {}\n",
    "    for train_file in os.listdir(os.path.join(data_dir, train_dir)):\n",
    "        idx = int(train_file.split('.')[0])\n",
    "        label = idx_label[idx]\n",
    "        mkdir_if_not_exist([data_dir, input_dir, 'train_valid', label])\n",
    "        shutil.copy(os.path.join(data_dir, train_dir, train_file),\n",
    "                    os.path.join(data_dir, input_dir, 'train_valid', label))\n",
    "        if label not in label_count or label_count[label] < n_train_per_label:\n",
    "            mkdir_if_not_exist([data_dir, input_dir, 'train', label])\n",
    "            shutil.copy(os.path.join(data_dir, train_dir, train_file),\n",
    "                        os.path.join(data_dir, input_dir, 'train', label))\n",
    "            label_count[label] = label_count.get(label, 0) + 1\n",
    "        else:\n",
    "            mkdir_if_not_exist([data_dir, input_dir, 'valid', label])\n",
    "            shutil.copy(os.path.join(data_dir, train_dir, train_file),\n",
    "                        os.path.join(data_dir, input_dir, 'valid', label))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面的`reorg_test`函数用来整理测试集，从而方便预测时的读取。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "6"
    }
   },
   "outputs": [],
   "source": [
    "def reorg_test(data_dir, test_dir, input_dir):\n",
    "    mkdir_if_not_exist([data_dir, input_dir, 'test', 'unknown'])\n",
    "    for test_file in os.listdir(os.path.join(data_dir, test_dir)):\n",
    "        shutil.copy(os.path.join(data_dir, test_dir, test_file),\n",
    "                    os.path.join(data_dir, input_dir, 'test', 'unknown'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "最后，我们用一个函数分别调用前面定义的`read_label_file`函数、`reorg_train_valid`函数以及`reorg_test`函数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "7"
    }
   },
   "outputs": [],
   "source": [
    "def reorg_cifar10_data(data_dir, label_file, train_dir, test_dir, input_dir,\n",
    "                       valid_ratio):\n",
    "    n_train_per_label, idx_label = read_label_file(data_dir, label_file,\n",
    "                                                   train_dir, valid_ratio)\n",
    "    reorg_train_valid(data_dir, train_dir, input_dir, n_train_per_label,\n",
    "                      idx_label)\n",
    "    reorg_test(data_dir, test_dir, input_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们在这里只使用100个训练样本和1个测试样本。训练数据集和测试数据集的文件夹名称分别为train_tiny和test_tiny。相应地，我们仅将批量大小设为1。实际训练和测试时应使用Kaggle比赛的完整数据集，并将批量大小`batch_size`设为一个较大的整数，如128。我们将10%的训练样本作为调参使用的验证集。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "8"
    }
   },
   "outputs": [],
   "source": [
    "if demo:\n",
    "    # 注意，此处使用小训练集和小测试集并将批量大小相应设小。使用Kaggle比赛的完整数据集时可\n",
    "    # 设批量大小为较大整数\n",
    "    train_dir, test_dir, batch_size = 'train_tiny', 'test_tiny', 1\n",
    "else:\n",
    "    train_dir, test_dir, batch_size = 'train', 'test', 128\n",
    "data_dir, label_file = '../data/kaggle_cifar10', 'trainLabels.csv'\n",
    "input_dir, valid_ratio = 'train_valid_test', 0.1\n",
    "reorg_cifar10_data(data_dir, label_file, train_dir, test_dir, input_dir,\n",
    "                   valid_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 图像增广\n",
    "\n",
    "为应对过拟合，我们使用图像增广。例如，加入`transforms.RandomFlipLeftRight()`即可随机对图像做镜面翻转，也可以通过`transforms.Normalize()`对彩色图像RGB三个通道分别做标准化。下面列举了其中的部分操作，你可以根据需求来决定是否使用或修改这些操作。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "9"
    }
   },
   "outputs": [],
   "source": [
    "transform_train = gdata.vision.transforms.Compose([\n",
    "    # 将图像放大成高和宽各为40像素的正方形\n",
    "    gdata.vision.transforms.Resize(40),\n",
    "    # 随机对高和宽各为40像素的正方形图像裁剪出面积为原图像面积0.64~1倍的小正方形，再放缩为\n",
    "    # 高和宽各为32像素的正方形\n",
    "    gdata.vision.transforms.RandomResizedCrop(32, scale=(0.64, 1.0),\n",
    "                                              ratio=(1.0, 1.0)),\n",
    "    gdata.vision.transforms.RandomFlipLeftRight(),\n",
    "    gdata.vision.transforms.ToTensor(),\n",
    "    # 对图像的每个通道做标准化\n",
    "    gdata.vision.transforms.Normalize([0.4914, 0.4822, 0.4465],\n",
    "                                      [0.2023, 0.1994, 0.2010])])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "测试时，为保证输出的确定性，我们仅对图像做标准化。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "transform_test = gdata.vision.transforms.Compose([\n",
    "    gdata.vision.transforms.ToTensor(),\n",
    "    gdata.vision.transforms.Normalize([0.4914, 0.4822, 0.4465],\n",
    "                                      [0.2023, 0.1994, 0.2010])])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 读取数据集\n",
    "\n",
    "接下来，可以通过创建`ImageFolderDataset`实例来读取整理后的含原始图像文件的数据集，其中每个数据样本包括图像和标签。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "10"
    }
   },
   "outputs": [],
   "source": [
    "# 读取原始图像文件。flag=1说明输入图像有3个通道（彩色）\n",
    "train_ds = gdata.vision.ImageFolderDataset(\n",
    "    os.path.join(data_dir, input_dir, 'train'), flag=1)\n",
    "valid_ds = gdata.vision.ImageFolderDataset(\n",
    "    os.path.join(data_dir, input_dir, 'valid'), flag=1)\n",
    "train_valid_ds = gdata.vision.ImageFolderDataset(\n",
    "    os.path.join(data_dir, input_dir, 'train_valid'), flag=1)\n",
    "test_ds = gdata.vision.ImageFolderDataset(\n",
    "    os.path.join(data_dir, input_dir, 'test'), flag=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们在`DataLoader`中指明定义好的图像增广操作。在训练时，我们仅用验证集评价模型，因此需要保证输出的确定性。在预测时，我们将在训练集和验证集的并集上训练模型，以充分利用所有标注的数据。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_iter = gdata.DataLoader(train_ds.transform_first(transform_train),\n",
    "                              batch_size, shuffle=True, last_batch='keep')\n",
    "valid_iter = gdata.DataLoader(valid_ds.transform_first(transform_test),\n",
    "                              batch_size, shuffle=True, last_batch='keep')\n",
    "train_valid_iter = gdata.DataLoader(train_valid_ds.transform_first(\n",
    "    transform_train), batch_size, shuffle=True, last_batch='keep')\n",
    "test_iter = gdata.DataLoader(test_ds.transform_first(transform_test),\n",
    "                             batch_size, shuffle=False, last_batch='keep')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 定义模型\n",
    "\n",
    "与[“残差网络（ResNet）”](../chapter_convolutional-neural-networks/resnet.ipynb)一节中的实现稍有不同，这里基于`HybridBlock`类构建残差块。这是为了提升执行效率。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "11"
    }
   },
   "outputs": [],
   "source": [
    "class Residual(nn.HybridBlock):\n",
    "    def __init__(self, num_channels, use_1x1conv=False, strides=1, **kwargs):\n",
    "        super(Residual, self).__init__(**kwargs)\n",
    "        self.conv1 = nn.Conv2D(num_channels, kernel_size=3, padding=1,\n",
    "                               strides=strides)\n",
    "        self.conv2 = nn.Conv2D(num_channels, kernel_size=3, padding=1)\n",
    "        if use_1x1conv:\n",
    "            self.conv3 = nn.Conv2D(num_channels, kernel_size=1,\n",
    "                                   strides=strides)\n",
    "        else:\n",
    "            self.conv3 = None\n",
    "        self.bn1 = nn.BatchNorm()\n",
    "        self.bn2 = nn.BatchNorm()\n",
    "\n",
    "    def hybrid_forward(self, F, X):\n",
    "        Y = F.relu(self.bn1(self.conv1(X)))\n",
    "        Y = self.bn2(self.conv2(Y))\n",
    "        if self.conv3:\n",
    "            X = self.conv3(X)\n",
    "        return F.relu(Y + X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面定义ResNet-18模型。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "def resnet18(num_classes):\n",
    "    net = nn.HybridSequential()\n",
    "    net.add(nn.Conv2D(64, kernel_size=3, strides=1, padding=1),\n",
    "            nn.BatchNorm(), nn.Activation('relu'))\n",
    "\n",
    "    def resnet_block(num_channels, num_residuals, first_block=False):\n",
    "        blk = nn.HybridSequential()\n",
    "        for i in range(num_residuals):\n",
    "            if i == 0 and not first_block:\n",
    "                blk.add(Residual(num_channels, use_1x1conv=True, strides=2))\n",
    "            else:\n",
    "                blk.add(Residual(num_channels))\n",
    "        return blk\n",
    "\n",
    "    net.add(resnet_block(64, 2, first_block=True),\n",
    "            resnet_block(128, 2),\n",
    "            resnet_block(256, 2),\n",
    "            resnet_block(512, 2))\n",
    "    net.add(nn.GlobalAvgPool2D(), nn.Dense(num_classes))\n",
    "    return net"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "CIFAR-10图像分类问题的类别个数为10。我们将在训练开始前对模型进行Xavier随机初始化。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_net(ctx):\n",
    "    num_classes = 10\n",
    "    net = resnet18(num_classes)\n",
    "    net.initialize(ctx=ctx, init=init.Xavier())\n",
    "    return net\n",
    "\n",
    "loss = gloss.SoftmaxCrossEntropyLoss()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 定义训练函数\n",
    "\n",
    "我们将根据模型在验证集上的表现来选择模型并调节超参数。下面定义了模型的训练函数`train`。我们记录了每个迭代周期的训练时间，这有助于比较不同模型的时间开销。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "12"
    }
   },
   "outputs": [],
   "source": [
    "def train(net, train_iter, valid_iter, num_epochs, lr, wd, ctx, lr_period,\n",
    "          lr_decay):\n",
    "    trainer = gluon.Trainer(net.collect_params(), 'sgd',\n",
    "                            {'learning_rate': lr, 'momentum': 0.9, 'wd': wd})\n",
    "    for epoch in range(num_epochs):\n",
    "        train_l_sum, train_acc_sum, n, start = 0.0, 0.0, 0, time.time()\n",
    "        if epoch > 0 and epoch % lr_period == 0:\n",
    "            trainer.set_learning_rate(trainer.learning_rate * lr_decay)\n",
    "        for X, y in train_iter:\n",
    "            y = y.astype('float32').as_in_context(ctx)\n",
    "            with autograd.record():\n",
    "                y_hat = net(X.as_in_context(ctx))\n",
    "                l = loss(y_hat, y).sum()\n",
    "            l.backward()\n",
    "            trainer.step(batch_size)\n",
    "            train_l_sum += l.asscalar()\n",
    "            train_acc_sum += (y_hat.argmax(axis=1) == y).sum().asscalar()\n",
    "            n += y.size\n",
    "        time_s = \"time %.2f sec\" % (time.time() - start)\n",
    "        if valid_iter is not None:\n",
    "            valid_acc = d2l.evaluate_accuracy(valid_iter, net, ctx)\n",
    "            epoch_s = (\"epoch %d, loss %f, train acc %f, valid acc %f, \"\n",
    "                       % (epoch + 1, train_l_sum / n, train_acc_sum / n,\n",
    "                          valid_acc))\n",
    "        else:\n",
    "            epoch_s = (\"epoch %d, loss %f, train acc %f, \" %\n",
    "                       (epoch + 1, train_l_sum / n, train_acc_sum / n))\n",
    "        print(epoch_s + time_s + ', lr ' + str(trainer.learning_rate))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 训练并验证模型\n",
    "\n",
    "现在，我们可以训练并验证模型了。下面的超参数都是可以调节的，如增加迭代周期等。由于`lr_period`和`lr_decay`分别设为80和0.1，优化算法的学习率将在每80个迭代周期后自乘0.1。简单起见，这里仅训练1个迭代周期。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "13"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "epoch 1, loss 6.638845, train acc 0.044444, valid acc 0.000000, time 1.01 sec, lr 0.1\n"
     ]
    }
   ],
   "source": [
    "ctx, num_epochs, lr, wd = d2l.try_gpu(), 1, 0.1, 5e-4\n",
    "lr_period, lr_decay, net = 80, 0.1, get_net(ctx)\n",
    "net.hybridize()\n",
    "train(net, train_iter, valid_iter, num_epochs, lr, wd, ctx, lr_period,\n",
    "      lr_decay)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 对测试集分类并在Kaggle提交结果\n",
    "\n",
    "得到一组满意的模型设计和超参数后，我们使用所有训练数据集（含验证集）重新训练模型，并对测试集进行分类。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "14"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "epoch 1, loss 5.878572, train acc 0.100000, time 0.92 sec, lr 0.1\n"
     ]
    }
   ],
   "source": [
    "net, preds = get_net(ctx), []\n",
    "net.hybridize()\n",
    "train(net, train_valid_iter, None, num_epochs, lr, wd, ctx, lr_period,\n",
    "      lr_decay)\n",
    "\n",
    "for X, _ in test_iter:\n",
    "    y_hat = net(X.as_in_context(ctx))\n",
    "    preds.extend(y_hat.argmax(axis=1).astype(int).asnumpy())\n",
    "sorted_ids = list(range(1, len(test_ds) + 1))\n",
    "sorted_ids.sort(key=lambda x: str(x))\n",
    "df = pd.DataFrame({'id': sorted_ids, 'label': preds})\n",
    "df['label'] = df['label'].apply(lambda x: train_valid_ds.synsets[x])\n",
    "df.to_csv('submission.csv', index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "执行完上述代码后，我们会得到一个submission.csv文件。这个文件符合Kaggle比赛要求的提交格式。提交结果的方法与[“实战Kaggle比赛：房价预测”](../chapter_deep-learning-basics/kaggle-house-price.ipynb)一节中的类似。\n",
    "\n",
    "\n",
    "## 小结\n",
    "\n",
    "* 可以通过创建`ImageFolderDataset`实例来读取含原始图像文件的数据集。\n",
    "* 可以应用卷积神经网络、图像增广和混合式编程来实战图像分类比赛。\n",
    "\n",
    "\n",
    "## 练习\n",
    "\n",
    "* 使用Kaggle比赛的完整CIFAR-10数据集。把批量大小`batch_size`和迭代周期数`num_epochs`分别改为128和300。可以在这个比赛中得到什么样的准确率和名次？\n",
    "* 如果不使用图像增广的方法能得到什么样的准确率？\n",
    "* 扫码直达讨论区，在社区交流方法和结果。你能发掘出其他更好的技巧吗？ \n",
    "\n",
    "## 扫码直达[讨论区](https://discuss.gluon.ai/t/topic/1545)\n",
    "\n",
    "![](../img/qr_kaggle-gluon-cifar10.svg)"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}